Data Structures
In this section we will explore the various data structures that Python has to offer. A data structure is an arrangement of data in a specific format that can be operated on and shared across our code to fulfill various requirements.
Lists
A list, like an array in other languages, is a container for a sequence of data items, which do not have to be unique. Items can be anything from literals and variables to other lists or data structures. The types of the items in a list can be mixed and nested: each item can be any legal Python data type or structure. Lists are used a great deal in Python, so it is well worth taking the time to look at them in detail here.
A list keeps its items in the order in which they were added or inserted; it does not sort them automatically.
A list is constructed using square brackets [], with each item in the list separated by a comma. An empty list can be declared using empty square brackets or the list() function. Below are a couple of simple examples of lists.
cars = [] # Empty list declaration using [] notation
cars = ["Toyota", "Audi", "BMW", "Mercedes"]
print(cars)
programmers = list() # Empty list declaration using the list() function
name = "Richard"
a_list = [name, "Programmer", "English"]
name = "Tayfun"
b_list = [name, "Programmer", "Turkish"]
# A list of lists
programmers = [a_list, b_list]
print(programmers)
# 3 by 3 matrix
matrix = [[0, 1, 2], [2, 3, 5], [5, 7, 9]]
Note: for readability, a space is used after each comma separating the items in a data structure.
Operations on lists
Python provides many list methods (functions that belong to the list) and operators for working with lists.
Just like arrays in most other languages, lists are zero-indexed, so the first element of a list is always at index 0 and the last element is at index len(list) - 1.
cars = ["Toyota", "Audi", "BMW", "Mercedes"]
list_length = len(cars)
print(list_length)
print(cars[0])
# The following print the same.
print(cars[3])
print(cars[list_length-1])
# When a negative index is used the list access starts from the end
print(cars[-1])
print(cars[-2])
You can add to a list using the built-in append() method. This does what it says on the tin: it appends an item to the end of the list.
cars = ["Toyota", "Audi", "BMW", "Mercedes"]
cars.append("Land Rover")
print(cars)
You can also add to the front of a list, or indeed to any position in the list, using the insert() method. It takes the index to insert at, followed by the item; the existing items from that index onwards shift one place to the right.
cars = ["Toyota", "Audi", "BMW", "Mercedes"]
cars.insert(0, "Land Rover")
print(cars)
cars.insert(3, "Honda")
print(cars)
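The insert() method is more forgiving than indexing, which the short sketch below illustrates on a cut-down cars list: an index past the end of the list does not raise an error but simply appends the item, and a negative index counts from the end, just as it does when reading items.

```python
cars = ['Toyota', 'Audi', 'BMW']
cars.insert(100, 'Volvo')  # index past the end: the item is simply appended
print(cars)  # ['Toyota', 'Audi', 'BMW', 'Volvo']
cars.insert(-1, 'Kia')     # negative index: inserted before the last item
print(cars)  # ['Toyota', 'Audi', 'BMW', 'Kia', 'Volvo']
```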
To add two lists together you can use the standard + operator, which creates a new list containing the items of both:
german_cars = ['Audi', 'BMW', 'Mercedes']
other_cars = ['Land Rover', 'Toyota', 'Honda']
cars = german_cars + other_cars
print(cars)
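You can also extend a list in place by using the extend() method, which appends every item of another list (or other iterable) to the end of it.
german_cars = ['Audi', 'BMW', 'Mercedes']
other_cars = ['Land Rover', 'Toyota', 'Honda']
cars = german_cars.copy()  # copy first: with cars = german_cars, extend() would also change german_cars
cars.extend(other_cars)
print(cars)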
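A common point of confusion is the difference between append() and extend(). The sketch below, reusing the car names from the examples above (the variable names appended and extended are illustrative), shows that append() adds its argument as one single item, so appending a list nests it, while extend() adds each item of the iterable separately.

```python
german_cars = ['Audi', 'BMW', 'Mercedes']
other_cars = ['Land Rover', 'Toyota', 'Honda']

appended = german_cars.copy()  # illustrative name
appended.append(other_cars)    # the whole list becomes ONE new item
print(appended)  # ['Audi', 'BMW', 'Mercedes', ['Land Rover', 'Toyota', 'Honda']]
print(len(appended))  # 4

extended = german_cars.copy()  # illustrative name
extended.extend(other_cars)    # each item of other_cars is added separately
print(extended)  # ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
print(len(extended))  # 6
```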
You can count how many times a specific item appears in a list (useful for spotting duplicated items) using the count() method.
cars = ["Toyota", "Audi", "BMW", "Mercedes", "BMW"]
print(cars.count("BMW"))
To delete an item at a specific position in a list you can use the del statement with the index of the item you wish to delete.
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
del cars[0]
print(cars)
To delete all the items in a list you can use the clear() method. This will result in an empty list.
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.clear()
print(cars)
To delete an item from the end of the list you can use the pop() method. pop() also returns the item it removed.
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.pop()
print(cars)
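As a small sketch of pop() beyond its default behaviour: it returns the item it removes, so you can keep it, and it accepts an optional index to remove an item from somewhere other than the end. The variable names last_car and first_car are illustrative.

```python
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
last_car = cars.pop()    # removes and returns the last item
print(last_car)          # Honda
first_car = cars.pop(0)  # pop(index) removes and returns the item at that index
print(first_car)         # Audi
print(cars)              # ['BMW', 'Mercedes', 'Land Rover', 'Toyota']
```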
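To remove an item by its value, you can use the remove() method. It removes the first matching item only, and raises a ValueError if the value is not in the list.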
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.remove('Land Rover')
print(cars)
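The two details of remove() worth knowing are sketched below on a list containing a duplicate: only the first match is removed, and a missing value raises a ValueError, which you can catch if the value may not be present.

```python
cars = ['Audi', 'BMW', 'Mercedes', 'BMW']
cars.remove('BMW')  # only the FIRST 'BMW' is removed
print(cars)         # ['Audi', 'Mercedes', 'BMW']

try:
    cars.remove('Volvo')  # 'Volvo' is not in the list
except ValueError as error:
    print('Could not remove:', error)
```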
You can get a slice of a list (a sublist) using slice notation, list[start:stop], which returns the items from index start up to, but not including, index stop:
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
german_cars = cars[0:3]  # or cars[:3]
print(german_cars)
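The full slice syntax is list[start:stop:step], and any of the three parts can be left out: start defaults to the beginning, stop to the end and step to 1. A few common forms, sketched on the same cars list:

```python
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
print(cars[:3])    # ['Audi', 'BMW', 'Mercedes']  (start defaults to 0)
print(cars[3:])    # ['Land Rover', 'Toyota', 'Honda']  (stop defaults to the end)
print(cars[-2:])   # ['Toyota', 'Honda']  (the last two items)
print(cars[::2])   # ['Audi', 'Mercedes', 'Toyota']  (every second item)
print(cars[::-1])  # a reversed copy; cars itself is unchanged
```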
Slices can also be used to update and delete part of a list:
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars[2:4] = ["Tofas", "Honda"]
print(cars)
del cars[2:4]
print(cars)
You can reverse the order of the items in a list, in place, using the reverse() method.
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.reverse()
print(cars)
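You can find an item's index in a list by using the index() method. It returns the index of the first matching item and raises a ValueError if the item is not in the list.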
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
print(cars.index("Mercedes"))
To sort the items in a list in place you can use the sort() method.
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.sort()
print(cars)
The sort() method sorts in ascending order by default, but you can also sort in descending order by passing the reverse=True parameter:
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars.sort(reverse=True)
print(cars)
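sort() changes the list it is called on. When you want a sorted copy instead, the built-in sorted() function returns a new list, and both sort() and sorted() accept a key function that decides what each item is compared by. A minimal sketch (by_length and names are illustrative):

```python
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
by_length = sorted(cars, key=len)  # sorted() returns a NEW list; cars is left as it was
print(by_length)  # ['BMW', 'Audi', 'Honda', 'Toyota', 'Mercedes', 'Land Rover']
print(cars)       # ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']

# By default, uppercase letters sort before lowercase ones
names = ['audi', 'BMW', 'Honda']
print(sorted(names))                 # ['BMW', 'Honda', 'audi']
print(sorted(names, key=str.lower))  # ['audi', 'BMW', 'Honda']  (case-insensitive)
```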
To copy a list to a new, independent list, you should use the copy() method. You can use the equals operator =, however this does not copy the list at all: it creates a reference, a second name for the same list, so any update made through the new name also applies to the original list.
# Creates a reference copy of the original list - all updates on the reference will apply to the original
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars2 = cars
cars2.remove('Audi')
cars2.append("Volvo")
print('reference copy -->', cars2)
print('original -->', cars)
# Creates a new copy of the original list - all updates on the new list will NOT apply to the original
cars = ['Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda']
cars2 = cars.copy()
cars2.remove('Audi')
cars2.append('Volvo')
print('copy -->', cars2)
print('original -->', cars)
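One way to see the difference between a reference and a copy is the is operator, which checks whether two names refer to the same object, while == compares contents. The sketch below also shows two other common ways of copying a list, the list() constructor and a full slice [:], which, like copy(), produce a new list (the variable names are illustrative).

```python
cars = ['Audi', 'BMW', 'Mercedes']
alias = cars          # no copy: a second name for the same list
copy1 = cars.copy()   # new list
copy2 = list(cars)    # new list built by the list() constructor
copy3 = cars[:]       # new list built from a full slice
print(alias is cars)  # True: same object
print(copy1 is cars, copy2 is cars, copy3 is cars)  # False False False
print(copy1 == cars)  # True: different objects with equal contents
```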
Tuples
Tuples are ordered sequences of items, which do not have to be unique, that are initialized once and cannot be changed afterwards: items cannot be added, removed or replaced (tuples are immutable). We use tuples to group information that will not change over the lifetime of the tuple. Tuples are generally said to be slightly faster than lists, particularly when they contain only literals.
Unlike lists, tuples are written with round brackets (). Like lists, tuples are zero-indexed.
Operating on Tuples
You can declare an empty tuple using empty round brackets () or the tuple() function.
Although a tuple cannot be changed, you can build a new, longer tuple by joining tuples with the + operator and assigning the result back to the variable:
cars = ("Bentley", )
car1 = "Audi"
cars = cars + (car1, 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
print(cars)
# Alternative empty tuple
a_tuple = tuple()
b_tuple = (1, 2)
c_tuple = a_tuple + b_tuple
d_tuple = c_tuple + (3,)
print(a_tuple)
print(b_tuple)
print(c_tuple)
print(d_tuple)
Notice the comma after the number 3 in (3,), and after "Bentley" in ("Bentley", ). The trailing comma is required for a single-item tuple: without it, (3) is just the number 3 in brackets.
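A quick way to confirm this is to check the type of each value. In the sketch below (variable names are illustrative), it is the comma, not the brackets, that makes a tuple; the brackets are optional in simple assignments.

```python
not_a_tuple = (3)   # just the integer 3 inside brackets
a_tuple = (3,)      # the trailing comma makes a one-item tuple
also_a_tuple = 3,   # the comma alone is enough; the brackets are optional
print(type(not_a_tuple))   # <class 'int'>
print(type(a_tuple))       # <class 'tuple'>
print(type(also_a_tuple))  # <class 'tuple'>
```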
You can get the length of a tuple using the len() function.
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
print(len(cars))
And just like lists, you can use positive indexing starting from 0 to the last index (length-1) and negative indexing to get the items starting from the last item in a tuple.
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
print(cars[0])
print(cars[-1])
If you need to update a tuple item after it has been set, you will need to convert the tuple to a list, perform the update and then convert the list back to a new tuple. For what it's worth, if you need to update items in a tuple, just use a list instead.
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
cars = list(cars)
cars[0] = "Mini"
cars = tuple(cars)
print(cars)
You can also unpack a tuple into a list, and vice versa, using the * operator:
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
cars = [*cars]
cars[0] = "Mini"
cars = (*cars,)
print(cars)
You can also unpack a tuple into variables. The number of variables must match the number of items in the tuple:
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
(car1, car2, car3, car4, car5, car6) = cars
print(car1)
print(car2)
print(car3)
print(car4)
print(car5)
print(car6)
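When you only need some of the items, Python 3 also supports extended unpacking: a name prefixed with * collects the remaining items into a list. The sketch below also shows the ValueError raised when the number of names does not match the number of items (the variable names are illustrative).

```python
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
first, second, *others = cars
print(first)   # Audi
print(second)  # BMW
print(others)  # ['Mercedes', 'Land Rover', 'Toyota', 'Honda']  (collected into a list)

*rest, last = cars
print(last)    # Honda

try:
    car1, car2 = cars  # 6 items but only 2 names
except ValueError as error:
    print('Unpacking failed:', error)
```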
Like lists, you can count the occurrences of an item in a tuple using the count() method.
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda', "BMW")
print(cars.count("BMW"))
And again, as with lists, you can find the index of a specific item in a tuple by using the index() method.
cars = ('Audi', 'BMW', 'Mercedes', 'Land Rover', 'Toyota', 'Honda')
print(cars.index("Mercedes"))
Dictionaries
Dictionaries in Python are collections of key-value pairs, where each 'key' is an identifier such as a string, integer or tuple (keys must be immutable, or more precisely hashable), associated with another object, simple or complex, which constitutes the value.
{'a_key': 'some_value'}
Above, some_value may be any legitimate literal, variable or data structure.
Dictionaries are mutable and iterable: items can be added, removed and changed. Since Python 3.7, dictionaries remember the order in which keys were inserted, but they do not maintain a position index, so items are looked up by key rather than by number.
Dictionaries are often used to group related variables into structured objects, which can then be copied to other objects. Python dictionaries have similar qualities to JSON (JavaScript Object Notation) objects, but of course Python has its own way of representing the underlying dictionary structure, storing its data, and its own set of methods for access and operations.
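To make the comparison with JSON concrete, Python's standard json module can convert a dictionary to JSON text and back. A minimal sketch, using a cut-down auto_club (json_text and restored are illustrative names); note that JSON object keys are always strings, so non-string keys such as integers come back as strings.

```python
import json

auto_club = {"Audi": {"owners": ["Jeff", "Mary", "Bob"], "model": "Quattro"}}

json_text = json.dumps(auto_club)  # dictionary -> JSON-formatted string
print(json_text)  # {"Audi": {"owners": ["Jeff", "Mary", "Bob"], "model": "Quattro"}}

restored = json.loads(json_text)   # JSON string -> a new dictionary
print(restored == auto_club)       # True

# JSON object keys are always strings, so integer keys do not survive the round trip
print(json.loads(json.dumps({1: 'one'})))  # {'1': 'one'}
```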
Dictionaries can be declared as empty using either curly brackets {} or the dict() function, or created with an initial set of key-value pairs.
courses = {}
courses = dict()
courses['teachers'] = ["Richard", "Tayfun"]
print(courses)
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"}}
print(auto_club)
You can see from the declaration of 'auto_club' that the value for each key is itself a dictionary. This is common practice: developers often use a dictionary to hold the related data for a key. You can nest dictionaries as deeply as you like in a key's value, but this becomes impractical once the nested structures grow too complex. The concept of KISS (Keep It Simple, Stupid) is key to good development.
Operating on dictionaries
To access a value, put its key in square brackets, [key], which returns the value associated with that key; if the key does not exist, a KeyError is raised. If a key's value is itself a dictionary, you can add a second key in square brackets to get a value from that nested dictionary.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"}}
# Get the key 'Land Rover' value
print(auto_club['Land Rover'])
# Get the model value from the values dictionary
print(auto_club['Audi']['model'])
You can also get the value for a key by using the get() method.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"}}
# Get the key 'Land Rover' value
print(auto_club.get('Land Rover'))
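Note: the advantage of getting a key's value with the get() method is that if the key does not exist in the dictionary, get() returns None (or a default you pass as a second argument) instead of raising a KeyError. This lets you test whether a key is in a dictionary, although the in operator ('Audi' in auto_club) is the more direct way to do so.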
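The three ways of dealing with a possibly missing key can be compared side by side. A minimal sketch with a one-entry auto_club, where 'Honda' is deliberately absent and the default text 'no members' is illustrative:

```python
auto_club = {"Audi": {"owners": ["Jeff", "Mary", "Bob"], "model": "Quattro"}}

print(auto_club.get('Honda'))                # None
print(auto_club.get('Honda', 'no members'))  # no members  (the default you supply)
print('Honda' in auto_club)                  # False
print('Audi' in auto_club)                   # True

try:
    auto_club['Honda']  # square brackets raise KeyError for a missing key
except KeyError as error:
    print('Missing key:', error)  # Missing key: 'Honda'
```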
It's easy to add a new key-value pair to a dictionary: simply assign a value to a new key in square brackets [] with the equals operator =, or use the update() method. Note that if the key already exists, both approaches replace its existing value.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"}}
# Add Honda
auto_club['Honda'] = {"owners": ["Rebecca", "Louise", "Ted", "Sophie"]}
print(auto_club)
auto_club.update({"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"}})
print(auto_club)
You can also add or change the values stored inside a key's nested dictionary by chaining keys:
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"]}}
# Add Honda model
auto_club['Honda']['model'] = "Civic"
print(auto_club['Honda'])
# Change the Honda model
auto_club['Honda']['model'] = "Prelude"
print(auto_club['Honda'])
To get the keys in a dictionary use the keys() method. This returns a dict_keys object, a view that contains the keys and reflects any later changes to the dictionary. If you just want a plain list or tuple, convert it with the list() or tuple() function.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
print(auto_club.keys())
print(list(auto_club.keys()))
print(tuple(auto_club.keys()))
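Alongside keys(), dictionaries provide values() and items(), and items() is the usual way to loop over keys and values together. The sketch below (make, details, models and keys are illustrative names) also shows that a keys view is live: a key added later appears in it without calling keys() again.

```python
auto_club = {"Audi": {"owners": ["Jeff", "Mary", "Bob"], "model": "Quattro"},
             "Land Rover": {"owners": ["Richard", "Hank"], "model": "Range Rover"}}

# items() gives (key, value) pairs, values() gives just the values
for make, details in auto_club.items():
    print(make, '->', details['model'])  # Audi -> Quattro, then Land Rover -> Range Rover

models = [details['model'] for details in auto_club.values()]
print(models)  # ['Quattro', 'Range Rover']

keys = auto_club.keys()  # a live view, not a snapshot
auto_club['Honda'] = {"owners": ["Rebecca"], "model": "Civic"}
print('Honda' in keys)   # True: the view reflects the new key
print(len(keys))         # 3
```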
To delete a key-value pair, use the pop() method with the key. pop() also returns the value that was removed.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
auto_club.pop("Honda")
print(auto_club)
To delete an item from a nested dictionary (or any key from a dictionary), use the del statement with the key:
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
del auto_club["Audi"]['model']
print(auto_club['Audi'])
You can remove the most recently inserted key-value pair from a dictionary using the popitem() method, which also returns the removed pair as a (key, value) tuple.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
auto_club.popitem()
print(auto_club)
To empty the dictionary, i.e. clear the dictionary of all key-value pairs, you can use the clear() method.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
auto_club.clear()
print(auto_club)
And to delete the dictionary completely, use the del statement on the dictionary's name.
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
del auto_club
print(auto_club)
The print() call above will fail with the error NameError: name 'auto_club' is not defined, because the name no longer exists.
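Just like lists, to copy a dictionary you use the copy() method and NOT the equals operator =. The copy() method makes a new dictionary, so adding, removing or replacing keys in the copy will not affect the original. Note, however, that it is a shallow copy: nested values, such as the inner dictionaries in auto_club, are shared between the copy and the original. Use deepcopy() from the copy module if the nested data must be independent too.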
auto_club = {"Audi": {"owners":["Jeff", "Mary", "Bob"], "model": "Quattro"},
"Land Rover": {"owners":["Richard", "Hank"], "model": "Range Rover"},
"Honda": {"owners":["Rebecca", "Louise", "Ted", "Sophie"], "model": "Prelude"}}
new_auto_club = auto_club.copy()
print(new_auto_club)
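Because auto_club holds nested dictionaries, the difference between a shallow copy and a deep copy matters here. The sketch below uses a one-entry auto_club and the standard copy module's deepcopy(); shallow and deep are illustrative names, and the model "A4" is an illustrative value.

```python
import copy

auto_club = {"Audi": {"owners": ["Jeff", "Mary", "Bob"], "model": "Quattro"}}

shallow = auto_club.copy()
shallow['Honda'] = {"owners": ["Rebecca"]}  # new top-level key: only the copy changes
print('Honda' in auto_club)                 # False
shallow['Audi']['model'] = "A4"             # the inner dict is shared with the original...
print(auto_club['Audi']['model'])           # A4  (...so the original changed too)

auto_club['Audi']['model'] = "Quattro"      # restore the original value
deep = copy.deepcopy(auto_club)             # copies the nested dicts and lists as well
deep['Audi']['model'] = "A4"
print(auto_club['Audi']['model'])           # Quattro  (the original is unaffected)
```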
Sets
Sets are unordered, iterable data structures that can hold multiple types of unique items in a single variable. Items must be immutable (more precisely, hashable), so once an item is in a set it cannot be changed. You can, however, add items to and delete items from a set.
Sets are often used to create a unique collection of data from another data structure, such as a list. Another big advantage of using sets is speed: checking whether a value is in a set takes, on average, roughly the same time however large the set is, whereas checking a list means scanning it item by item. If you have large data sets that you need to search repeatedly, using sets for those lookups can speed up your program considerably, which is one reason sets are popular in data science, where data sets are large and speed matters.
set1 = {1,4,7,9,0,3,6,5,2,8,1,9,9}
set2 = set(["Richard", "Tayfun", "Richard", "Tayfun"])
print(set1)
print(set2)
When you run the code above in the Python console, you'll see that duplicates have been removed.
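Two practical points are worth sketching here. First, {} creates an empty dictionary, not a set, so an empty set must be written set(). Second, converting a list to a set removes duplicates but does not keep the original order; if the order matters, dict.fromkeys() (which relies on dictionaries keeping insertion order, guaranteed since Python 3.7) is a common alternative. Variable names are illustrative.

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>: {} creates an empty dictionary, not a set
print(type(empty_set))     # <class 'set'>

names = ["Richard", "Tayfun", "Richard", "Pablo", "Tayfun"]
unique_names = set(names)                    # duplicates removed, order not preserved
ordered_unique = list(dict.fromkeys(names))  # duplicates removed, first-seen order kept
print(ordered_unique)  # ['Richard', 'Tayfun', 'Pablo']
```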
Operating on Sets
You can add to a set using the add() method.
set1 = set(["Richard", "Tayfun", "Richard", "Tayfun"])
set1.add('Pablo')
print(set1)
You can delete from a set by using the discard() or the remove() methods. Using discard() over remove() is preferred if you do not want to get an error when the item you are attempting to remove is not actually in the set; remove() raises a KeyError if the item is not in the set.
Using the discard() method:
set1 = {'Tayfun', 'Richard', 'Pablo'}
set1.discard('Tayfun')
set1.discard('Harry')  # 'Harry' is not in the set: discard() does nothing and raises no error
print(set1)
Using the remove() method:
set1 = {'Tayfun', 'Richard', 'Pablo'}
set1.remove('Tayfun')
print(set1)
You can use the pop() method to remove an arbitrary element from a set (you cannot choose which one), but why you would want to do that is another matter. The method returns the item it removed, and raises a KeyError if the set is empty.
set1 = {'Tayfun', 'Richard', 'Pablo'}
removed = set1.pop()
print(removed)
print(set1)
To empty the set you can use the clear() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set1.clear()
print(set1)
Natively, sets do not maintain an index for each item, so you cannot ask for the set item at index 3 as you can with a list. However, you can find out whether an item is in a set very efficiently, because sets are implemented as hash tables and every item in a set must be hashable.
For an object to be hashable, its hash must never change during its lifetime, which in practice means it must be immutable: numbers, strings, tuples of immutable items and even functions are hashable, while lists, dictionaries and sets are not. A hash is an integer produced by applying a hashing algorithm to an object (Python's built-in hash() function returns it). Hashes are not guaranteed to be unique, but equal objects always have equal hashes. Hashing lets a set jump almost directly to the place where an item would be stored instead of comparing it against every item, and it is this that makes sets very fast when asking whether they contain a specific data item.
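The sketch below illustrates the hashability rule in practice: a tuple of strings can be added to a set, a list cannot (Python raises a TypeError), and frozenset, the built-in immutable version of a set, is hashable and so can itself be stored in a set. The name team is illustrative, and the exact wording of the TypeError message can vary between Python versions, so it is not shown.

```python
print(hash("Richard") == hash("Richard"))  # True: equal values give equal hashes

set1 = {'Tayfun', 'Richard'}
set1.add(('Pablo', 'Chris'))  # a tuple of strings is hashable, so it can be a set item
try:
    set1.add(['Harry'])       # a list is mutable, so it is unhashable
except TypeError as error:
    print('Cannot add a list:', error)

team = frozenset(['Pablo', 'Chris'])  # frozenset: an immutable, hashable set
set1.add(team)                        # so it can be stored inside another set
print(team in set1)  # True
```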
You can use the expression 'value in set' to see if a value is part of a set.
set1 = {'Tayfun', 'Richard', 'Pablo'}
print("Richard" in set1)
Sets are pretty useful for comparing collections of data; think of the following operations as computational Venn diagrams.
If you have more than one set and need to find out which values they have in common, you can use the intersection() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
print(set1.intersection(set2))
To find the values in one set that are not in another, use the difference() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
# Prints the values in set1 that are not in set2
print(set1.difference(set2))
# Prints the values in set2 that are not in set1
print(set2.difference(set1))
Removing the values from a set that are not present in another set can be achieved by using the intersection_update() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
# Removes the values in set1 that are not in set2
set1.intersection_update(set2)
print("set 1:", set1)
print("set 2:", set2)
Conversely, removing the values from a set that are present in another set can be achieved by using the difference_update() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
# Removes the values in set1 that are in set2
set1.difference_update(set2)
print("set 1:", set1)
print("set 2:", set2)
To create a new set containing the values that are in either set but not in both, use the symmetric_difference() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
# Values that are in set1 or set2, but not in both
set3 = set1.symmetric_difference(set2)
print(set3)
To update a set in place so that it keeps only the values found in exactly one of the two sets (removing the shared values and adding the other set's values), use the symmetric_difference_update() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
set1.symmetric_difference_update(set2)
print(set1)
To check whether every value in a set is also contained in another set, you can use the issubset() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
print(set1.issubset(set2))
set1 = {'Tayfun', 'Richard','Pablo'}
set2 = {'Pablo', 'Tayfun', 'Richard','Chris', 'Harry'}
print(set1.issubset(set2))
Note that in the second example every value in 'set1' is also in 'set2', so set1.issubset(set2) returns True. Values do not have to be in the same order, since sets are unordered.
To check whether a set contains all the values of another set, you can use the issuperset() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
print(set2.issuperset(set1))
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Tayfun', 'Richard', 'Chris', 'Harry'}
print(set2.issuperset(set1))
To join sets together there are a couple of methods. The first, the union() method, joins two sets into a third set. The second, the update() method, adds the items of the second set to the first set.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
# First method for joining sets, uses a third set
set3 = set1.union(set2)
print(set3)
# Second method adds set2 to set1
set1.update(set2)
print(set1)
Remember, set values are unique so duplicates are removed when joining sets.
And finally, if you wish to check whether two sets have no values in common, use the isdisjoint() method.
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
print(set1.isdisjoint(set2))
set2 = {'Chris', 'Harry'}
print(set1.isdisjoint(set2))
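Most of the set methods above also have operator forms: & for intersection, | for union, - for difference, ^ for symmetric difference, and <= and >= for subset and superset tests. A minimal sketch checking that each operator agrees with its method; one difference worth knowing is that the methods accept any iterable, while the operators need sets on both sides.

```python
set1 = {'Tayfun', 'Richard', 'Pablo'}
set2 = {'Pablo', 'Chris', 'Harry'}
print((set1 & set2) == set1.intersection(set2))          # True
print((set1 | set2) == set1.union(set2))                 # True
print((set1 - set2) == set1.difference(set2))            # True
print((set1 ^ set2) == set1.symmetric_difference(set2))  # True
print({'Pablo'} <= set1)  # True, like {'Pablo'}.issubset(set1)
print(set1 >= {'Pablo'})  # True, like set1.issuperset({'Pablo'})

# The methods accept any iterable; the operators need sets on both sides
print(set1.union(['Chris']))
try:
    set1 | ['Chris']
except TypeError as error:
    print('Operator needs a set:', error)
```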
A note on memory consumption: as a rough guide, a tuple uses less memory than a list holding the same items, and sets and dictionaries, which are built on hash tables, typically use considerably more than either. If your data is immutable, choose a tuple.
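If you want to check these figures on your own interpreter, sys.getsizeof() reports the size of a container object in bytes. The sketch below builds each structure from the same 1,000 integers (items and sizes are illustrative names). The exact numbers depend on the Python version and platform, so none are shown, and getsizeof() counts only the container itself, not the objects it refers to.

```python
import sys

items = range(1000)
sizes = {
    'tuple': sys.getsizeof(tuple(items)),
    'list': sys.getsizeof(list(items)),
    'set': sys.getsizeof(set(items)),
    'dict': sys.getsizeof(dict.fromkeys(items)),  # keys only; every value is None
}
for name, size in sizes.items():
    # getsizeof() counts the container itself, not the integer objects it refers to
    print(f'{name:>5}: {size} bytes')
```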
That's it for this section; you can move on to the next part of this tutorial.